21 research outputs found

    Learning to sample from noise with deep generative models

    Machine learning, and specifically deep learning, has made significant breakthroughs in recent years across a range of tasks. One well-known application of deep learning is computer vision: tasks such as detection and classification are nearly considered solved by the community. However, training state-of-the-art models for such tasks requires labels for the data we want to classify. A more general goal, akin to what animal brains achieve, is to design algorithms that can extract meaningful features from unlabeled data. Unsupervised learning is one of the research directions that addresses this problem. In this thesis, I present a new way to train a neural network as a generative model capable of producing quality samples (a task akin to imagining). I explain how, starting from noise, it is possible to obtain samples that are close to the training data. This iterative procedure is called Infusion training and is a novel approach to learning the transition operator of a generative Markov chain. The first chapter presents background on machine learning and probabilistic models. The second chapter presents the generative models that inspired this work. The third and last chapter presents and investigates our novel approach to learning a generative model with Infusion training.
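    To make the procedure concrete, here is a minimal sketch of the infusion idea in PyTorch. It simplifies the thesis's method considerably: the stochastic infusion and the Gaussian transition operator are replaced by a deterministic blend and a mean-squared-error surrogate, and the names (`TransitionNet`, `infusion_rate`) are illustrative, not taken from the thesis.

```python
import torch
import torch.nn as nn

class TransitionNet(nn.Module):
    """Toy transition operator: maps the current chain state to the next one."""
    def __init__(self, dim=784, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, z):
        return self.net(z)

def infusion_training_step(model, opt, x, n_steps=5, infusion_rate=0.1):
    """One training step on a batch x.

    The chain starts from pure noise. At every step the state is 'infused'
    with a little of the target x, and the operator is trained to move the
    state toward x (MSE here stands in for the Gaussian log-likelihood used
    in the thesis).
    """
    z = torch.randn_like(x)                # start the chain from noise
    loss = 0.0
    for t in range(n_steps):
        z_inf = (1 - infusion_rate) * z + infusion_rate * x  # infused state
        z_next = model(z_inf)
        loss = loss + ((z_next - x) ** 2).mean()
        z = z_next.detach()                # chain continues from the prediction
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

    At sampling time the infusion is dropped: one draws fresh noise and simply iterates the learned operator for a fixed number of steps.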

    Objectives Matter: Understanding the Impact of Self-Supervised Objectives on Vision Transformer Representations

    Joint-embedding based learning (e.g., SimCLR, MoCo, DINO) and reconstruction-based learning (e.g., BEiT, SimMIM, MAE) are the two leading paradigms for self-supervised learning of vision transformers, but they differ substantially in their transfer performance. Here, we aim to explain these differences by analyzing the impact of these objectives on the structure and transferability of the learned representations. Our analysis reveals that reconstruction-based features are significantly dissimilar to joint-embedding features, and that models trained with similar objectives learn similar features even across architectures. These differences arise early in the network and are primarily driven by attention and normalization layers. We find that joint-embedding features yield better linear-probe transfer for classification because the different objectives drive different distributions of information and invariances in the learned representation. These differences explain opposite trends in transfer performance for downstream tasks that require spatial specificity in features. Finally, we address how fine-tuning changes reconstructive representations to enable better transfer, showing that fine-tuning reorganizes the information to be more similar to pre-trained joint-embedding models.
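    The abstract does not name the similarity metric behind these feature comparisons; a standard choice for comparing representations across models is linear centered kernel alignment (CKA), sketched below in NumPy as a generic tool rather than the paper's exact analysis.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA similarity between two feature matrices.

    X: (n_samples, d1), Y: (n_samples, d2) -- activations of two models
    on the same inputs. Returns a scalar in [0, 1]; 1 means the two
    representations are identical up to rotation and isotropic scaling.
    """
    # Center each feature dimension.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(Y.T @ X, 'fro') ** 2
    norm_x = np.linalg.norm(X.T @ X, 'fro')
    norm_y = np.linalg.norm(Y.T @ Y, 'fro')
    return cross / (norm_x * norm_y)
```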

    A surprisingly simple technique to control the pretraining bias for better transfer: Expand or Narrow your representation

    Self-Supervised Learning (SSL) models rely on a pretext task to learn representations. Because this pretext task differs from the downstream tasks used to evaluate the performance of these models, there is an inherent misalignment, or pretraining bias. A commonly used trick in SSL, shown to make deep networks more robust to such bias, is the addition of a small projector (usually a 2- or 3-layer multi-layer perceptron) on top of a backbone network during training. In contrast to previous work that studied the impact of the projector architecture, we here focus on a simpler, yet overlooked, lever to control the information in the backbone representation. We show that merely changing its dimensionality -- by changing only the size of the backbone's very last block -- is a remarkably effective technique to mitigate the pretraining bias. It significantly improves downstream transfer performance for both Self-Supervised and Supervised pretrained models.
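    As a rough illustration of the lever being described (the code and names are hypothetical, not from the paper), the only change is the output width of the backbone's final block:

```python
import torch.nn as nn

def make_backbone(in_dim=3 * 32 * 32, base_dim=512, width_multiplier=4):
    """Toy MLP backbone whose final block is expanded or narrowed.

    `width_multiplier` is the single knob: it changes only the output
    dimensionality of the very last block, leaving everything else fixed.
    """
    last_dim = base_dim * width_multiplier
    backbone = nn.Sequential(
        nn.Linear(in_dim, base_dim), nn.ReLU(),    # earlier blocks: unchanged
        nn.Linear(base_dim, base_dim), nn.ReLU(),
        nn.Linear(base_dim, last_dim), nn.ReLU(),  # widened/narrowed last block
    )
    return backbone, last_dim  # last_dim feeds the projector or linear probe
```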

    PUG: Photorealistic and Semantically Controllable Synthetic Data for Representation Learning

    Synthetic image datasets offer unmatched advantages for designing and evaluating deep neural networks: they make it possible to (i) render as many data samples as needed, (ii) precisely control each scene and yield granular ground-truth labels (and captions), and (iii) precisely control distribution shifts between training and testing to isolate variables of interest for sound experimentation. Despite such promise, the use of synthetic image data is still limited -- and often played down -- mainly due to its lack of realism. Most works therefore rely on datasets of real images, which have often been scraped from public images on the internet, may have issues with regard to privacy, bias, and copyright, and offer little control over how objects precisely appear. In this work, we present a path to democratize the use of photorealistic synthetic data: we develop a new generation of interactive environments for representation learning research that offer both controllability and realism. We use the Unreal Engine, a powerful game engine well known in the entertainment industry, to produce PUG (Photorealistic Unreal Graphics) environments and datasets for representation learning. In this paper, we demonstrate the potential of PUG to enable more rigorous evaluations of vision models.
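    A typical way to exploit such factor-level control (sketched here with an illustrative metadata schema, not the actual PUG file layout) is to hold out values of a single factor between training and testing:

```python
import pandas as pd

def controlled_split(metadata_csv):
    """Build a train/test split with a controlled distribution shift.

    Assumes a metadata table with one row per rendered image and factor
    columns such as 'background' (column names are illustrative).
    """
    meta = pd.read_csv(metadata_csv)
    # Train on a subset of backgrounds, test on held-out ones: the only
    # thing that changes between splits is the factor under study.
    backgrounds = sorted(meta['background'].unique())
    train = meta[meta['background'].isin(backgrounds[:-2])]
    test = meta[meta['background'].isin(backgrounds[-2:])]
    return train, test
```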

    On the use of a pulsed-laser source in laboratory seismic experiments

    Reproducing large-scale seismic exploration at laboratory scale with controllable sources is a promising approach: it can be applied not only to study small-scale physical properties of the medium, but can also contribute to significant progress in understanding wave propagation and imaging complex media at exploration scale via upscaling methods. We seek to characterize the properties of a laser-generated seismic source for new geophysical experiments at laboratory scale. This consists of generating seismic waves by pulsed-laser impacts and measuring the displacement wavefield by laser vibrometry. Parallel 2D/3D simulations using the Discontinuous Galerkin discretization method, together with analytic predictions, have been carried out to match the experimental data.
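    As a drastically simplified stand-in for the 2D/3D Discontinuous Galerkin simulations mentioned above, the sketch below propagates a pulsed (Ricker-wavelet) point source on a 1D acoustic finite-difference grid; all parameters are illustrative.

```python
import numpy as np

def simulate_1d(nx=400, nt=1000, dx=1.0, c=2000.0):
    """1D acoustic wave propagation with a Ricker-wavelet point source.

    Only illustrates the forward-modelling step one would match against
    measured wavefields; not the paper's 2D/3D DG solver.
    """
    dt = 0.4 * dx / c                       # CFL-stable time step
    t = np.arange(nt) * dt
    f0 = 50.0                               # source peak frequency (Hz)
    a = (np.pi * f0 * (t - 1.0 / f0)) ** 2
    src = (1 - 2 * a) * np.exp(-a)          # Ricker wavelet (pulsed source)

    u_prev, u = np.zeros(nx), np.zeros(nx)
    frames = []
    for n in range(nt):
        lap = np.zeros(nx)
        lap[1:-1] = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2
        u_next = 2 * u - u_prev + (c * dt) ** 2 * lap
        u_next[nx // 2] += src[n] * dt**2   # inject the source at the center
        u_prev, u = u, u_next
        frames.append(u.copy())
    return np.array(frames)                 # displacement field over time
```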

    A Cookbook of Self-Supervised Learning

    Self-supervised learning, dubbed the dark matter of intelligence, is a promising path to advance machine learning. Yet, much like cooking, training SSL methods is a delicate art with a high barrier to entry. While many components are familiar, successfully training an SSL method involves a dizzying set of choices, from the pretext tasks to training hyper-parameters. Our goal is to lower the barrier to entry into SSL research by laying out the foundations and the latest SSL recipes in the style of a cookbook. We hope to empower the curious researcher to navigate the terrain of methods, understand the role of the various knobs, and gain the know-how required to explore how delicious SSL can be.
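    As one example of the kind of recipe the cookbook covers, here is a common joint-embedding ingredient, the NT-Xent (InfoNCE) contrastive loss used by SimCLR-style methods, in a minimal PyTorch sketch:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent / InfoNCE loss between two augmented views.

    z1, z2: (batch, dim) embeddings of two augmentations of the same images.
    Each sample's positive is its other view; every other embedding in the
    batch acts as a negative.
    """
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2B, d), unit norm
    sim = z @ z.t() / temperature                       # cosine similarities
    sim.fill_diagonal_(float('-inf'))                   # exclude self-pairs
    b = z1.shape[0]
    # The positive for index i is i+B (first half) or i-B (second half).
    targets = torch.cat([torch.arange(b, 2 * b), torch.arange(0, b)])
    return F.cross_entropy(sim, targets)
```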

    Example-based wrinkle synthesis for clothing animation

    This paper describes a method for animating the appearance of clothing, such as pants or a shirt, that fits closely to a figure's body. Compared to flowing cloth, such as loose dresses or capes, these types of garments involve nearly continuous collision contact and small wrinkles that can be troublesome for traditional cloth simulation methods. Based on the observation that the wrinkles in close-fitting clothing behave in a predominantly kinematic fashion, we have developed an example-based wrinkle synthesis technique. Our method drives wrinkle generation from the pose of the figure's kinematic skeleton. This approach allows high-quality clothing wrinkles to be combined with a coarse cloth simulation that computes the global and dynamic aspects of the clothing motion. While the combined results do not exactly match a high-resolution reference simulation, they do capture many of the characteristic fine-scale features and wrinkles. Further, the combined system runs at interactive rates, making it suitable for applications where high-resolution offline simulations would not be a viable option. The wrinkle synthesis method uses a precomputed database built by simulating the high-resolution clothing as the articulated figure is moved over a range of poses. In principle, the space of poses is exponential in the total number of degrees of freedom; however, clothing wrinkles are primarily affected by the nearest joints, allowing each joint to be processed independently. During synthesis, mesh interpolation is used to consider the influence of multiple joints and is combined with a coarse simulation to produce the final results at interactive rates.
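    Schematically, the per-joint database lookup and interpolation could look like the following sketch (the data layout and names are hypothetical; the paper's interpolation operates on mesh displacements):

```python
import numpy as np

def synthesize_wrinkles(pose, databases):
    """Pose-driven wrinkle synthesis (schematic).

    pose: dict mapping joint name -> joint angle.
    databases: dict mapping joint name -> list of (example_angle,
    displacement_map) pairs precomputed from high-resolution simulation.
    Each joint is handled independently, as in the paper; here the two
    nearest examples are linearly interpolated.
    """
    total = None
    for joint, angle in pose.items():
        examples = sorted(databases[joint], key=lambda p: p[0])
        angles = np.array([a for a, _ in examples])
        i = int(np.clip(np.searchsorted(angles, angle), 1, len(angles) - 1))
        (a0, d0), (a1, d1) = examples[i - 1], examples[i]
        w = (angle - a0) / (a1 - a0) if a1 != a0 else 0.0
        disp = (1.0 - w) * d0 + w * d1      # blend the two wrinkle maps
        total = disp if total is None else total + disp
    return total  # added on top of the coarse cloth simulation result
```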

    Asymmetric jet correlations in p p↑ scattering

    We propose that back-to-back correlations in the azimuthal angle of jets produced in collisions of unpolarized with transversely polarized proton beams could be used to determine Sivers functions. The corresponding single-spin asymmetry is not power-suppressed, but it is subject to Sudakov suppression. We present estimates of the asymmetry (without and with Sudakov effects) for RHIC at jet transverse momenta of ~10 GeV and show that it may reach a few per cent or more and could provide access to the gluon Sivers function.
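    For reference, the Sivers function is defined by the spin-dependent part of the unpolarized-quark number density in a transversely polarized proton; in one common convention (signs and normalizations vary across the literature):

```latex
% Quark density in a proton polarized transversely to its momentum P;
% M is the proton mass, k_T the quark transverse momentum, S_T the
% transverse spin vector. f_{1T}^{\perp q} is the Sivers function.
f_{q/p^{\uparrow}}(x,\mathbf{k}_{T};\mathbf{S}_{T})
  = f_{1}^{q}(x,k_{T}^{2})
  - f_{1T}^{\perp q}(x,k_{T}^{2})\,
    \frac{(\hat{\mathbf{P}}\times\mathbf{k}_{T})\cdot\mathbf{S}_{T}}{M}
```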